Abstract: Influenza A virus (H1N1), which arose in 2009, constituted the fourth pandemic after the cases of 1918, 1957, 
and 1968. This new variant was formed by a triple reassortment, with genomic segments from swine, avian, and human 
influenza origins. The objective of this study was to analyze sequences of hemagglutinin (n=2038) and neuraminidase 
(n=1273) genes, in order to assess the extent of diversity among circulating 2009-2010 strains, estimate whether these genes 
evolved through positive, negative, or neutral selection models of evolution during the pandemic phase, and analyze the 
worldwide percentage of detection of important amino acid mutations that could enhance viral performance, such as 
transmissibility or resistance to drugs. Continuous surveillance by public health authorities will be critical to monitor the 
appearance of new influenza variants, especially in animal reservoirs such as swine and birds, in order to prevent the 
potential animal-human transmission of viruses with pandemic potential. 
INTRODUCTION 

Influenza A viruses belong to the Orthomyxoviridae 
family, and have a genome composed of eight segments of 
single-stranded, negative-sense RNA. Their surface consists 
of a lipid envelope, derived from the plasma membrane of 
infected epithelial cells, and two antigenic proteins: 
Hemagglutinin (HA) and Neuraminidase (NA). These two 
antigens exhibit higher variability than the remaining viral 
proteins [1]. Based on the extent of variability of these two 
surface proteins, 16 HA (H1-H16) and 9 NA (N1-N9) 
genotypes are known to date, and they can occur in 
different combinations [1, 2]. 

In early April 2009, Mexican public health authorities 
observed a high number of influenza-like illnesses in 
their territory, and reported this outbreak to the 
regional office of the World Health Organization (WHO). In 
mid-April, the Centers for Disease Control in the USA 
identified the new virus in two cases from California. The 
new virus spread rapidly throughout the world, and as a 
consequence the WHO authorities declared the "Pandemic 
(H1N1) 2009" on June 11, 2009 [3]. It is thought that the 
new 2009 H1N1 pandemic virus (hereafter, 2009 
H1N1pdm) emerged through at least four reassortment 
and transmission events among swine, avian and human 
H1N1 lineages, probably in Asia and North America [4]. 
In particular, the HA segment of 2009 H1N1pdm 
originated from the American swine lineage, whereas the NA 
segment derived from the European swine lineage [5, 6]. It is 
believed that the ancestors of this pandemic strain remained 
undetected for approximately one decade due to the lack of a 
surveillance system in pigs, the historical "mixing vessel" 
for new influenza viruses. Furthermore, the closest ancestors 
of the new pandemic strains probably emerged in January 
2009 [4]. 

The objective of this study was to analyze a dataset of 
complete nucleotide (nt) sequences of HA and NA genes, in 
order to assess the extent of diversity among circulating 
2009-2010 strains, estimate if these genes evolved through 
positive, negative, or neutral selection models of evolution 
during the pandemic phase, and analyze the worldwide 
percentage of detection of important amino acid mutations 
that could enhance the viral performance, such as 
transmissibility or resistance to drugs. 

Complete coding sequences (CDS) of HA (1701 nt) and 
NA (1410 nt) genes corresponding to 2009 H1N1pdm, 
isolated from humans, were downloaded from the Influenza 
Virus Resource (http://www.ncbi.nlm.nih.gov/genomes/FLU/SwineFlu.html) 
of the National Center for Biotechnology Information, 
selected by year of deposition in the repository. The initial 
dataset consisted of 3765 HA and 2996 NA sequences 
reported in the period 2009-2010. 
After discarding exact sequence duplicates using a Perl 
script, we obtained 2038 HA and 1273 NA sequences; 
each retained sequence differed from every other by at 
least one nucleotide. Reassortant strains 
were discarded, as well as incomplete CDS sequences. 
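
The duplicate-removal step described above can be sketched as follows. The original work used a Perl script that is not reproduced in the text, so this Python version is only an illustration under stated assumptions: the function names (`parse_fasta`, `drop_exact_duplicates`), the case-insensitive comparison, and the toy sequences are all hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_exact_duplicates(records, expected_length=None):
    """Keep the first record of each distinct sequence.

    If expected_length is given (e.g. 1701 for HA, 1410 for NA), records of
    any other length are skipped as incomplete CDS. Comparison ignores case
    (an assumption of this sketch).
    """
    seen, unique = set(), []
    for header, seq in records:
        key = seq.upper()
        if expected_length is not None and len(key) != expected_length:
            continue  # treat as an incomplete CDS
        if key in seen:
            continue  # exact duplicate of a sequence already kept
        seen.add(key)
        unique.append((header, seq))
    return unique


# Toy input (hypothetical strains, 9-nt "CDS" for brevity).
example = """>strain_A
ATGAAGGCA
>strain_B
atgaaggca
>strain_C
ATGAAGGCT
>strain_D
ATGAAG
"""
unique_records = drop_exact_duplicates(parse_fasta(example), expected_length=9)
print([header for header, _ in unique_records])
```

In this toy input, strain_B is an exact duplicate of strain_A and strain_D is shorter than the expected length, so only two records are retained.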

Nucleotide sequences were manually edited in FASTA 
format, using BioEdit v7.0.5 [7], and aligned with 
CLUSTAL W [8]. Sequence information (GenBank 
accession number, strain, and year of isolation) for each 
sample used in this study is available for HA (Table S1) 
and NA genes (Table S2), respectively. Pairwise distances 
were calculated with MEGA v5 [9]. The percentages of 
identity were calculated by applying the formula 100 - 
(pairwise distance value × 100). A graph was constructed by 
plotting the percentage identities on the abscissa (x axis) vs 
the frequency of each of the calculated pairwise identities on 
the ordinate (y axis). The graphs were prepared in the R 
environment, using the ggplot2 package (www.r-project.org). 
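
The identity formula above can be illustrated with a short sketch. It assumes an uncorrected p-distance (proportion of differing sites) as the pairwise distance, which may differ from the distance setting actually used in MEGA v5; the function names and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def percent_identity(distance):
    """Formula from the text: 100 - (pairwise distance value x 100)."""
    return 100 - distance * 100


def identity_frequencies(sequences, ndigits=1):
    """Count how often each rounded percentage identity occurs over all pairs."""
    return Counter(
        round(percent_identity(p_distance(a, b)), ndigits)
        for a, b in combinations(sequences, 2)
    )


# Toy alignment: three 10-nt sequences, the third differs at one site.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAA"]
freqs = identity_frequencies(toy)
for identity, count in sorted(freqs.items()):
    print(identity, count)
```

The resulting counts are the values that would be plotted as frequency (y axis) against percentage identity (x axis).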

The models of nucleotide substitution that best fitted 
each dataset were determined with MEGA v5 [9]: the 
GTR+I model for HA genes and the T92+G model for NA 
genes. Phylogenetic relationships were 
reconstructed by the Neighbor-Joining method [10], with the 
appropriate model of nucleotide substitution for each 
dataset (as described above) and bootstrap analysis of 1000 
replicates, as incorporated in MEGA v5 [9]. Outgroup 
sequences for HA and NA genes corresponded to strain 
A/Puerto Rico/8/1934. 

Mutations in each CDS were analyzed by the method of 
Nei and Gojobori [11]. Codon-aligned sequences for each 
dataset were analyzed using the Perl-based SNAP program 
(http://www.hiv.lanl.gov/content/sequence/SNAP/SNAP.html) 
[12] in order to calculate the variability of each CDS. The 
selective pressure was measured by comparing the rate of 
non-synonymous nucleotide substitutions per 
non-synonymous site (dN) against that of synonymous 
substitutions per synonymous site (dS). The ratio dN/dS was 
used as an index to assess positive selection. A ratio dN/dS >1 
means positive (diversifying) selection, =1 means neutral 
selection, and <1 means negative (purifying) selection. 
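
As a rough illustration of how the dN/dS index is read, the sketch below converts observed proportions of synonymous and non-synonymous differences into dS and dN with the Jukes-Cantor correction used in the Nei-Gojobori approach, then classifies the ratio. The site and difference counts are hypothetical placeholders, not values from this study, and the counting of sites and differences itself (the part performed by SNAP) is not shown.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor correction: d = -(3/4) * ln(1 - 4p/3)."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(syn_diffs, syn_sites, nonsyn_diffs, nonsyn_sites):
    """Return (dN, dS, dN/dS) from difference and site counts."""
    d_s = jukes_cantor(syn_diffs / syn_sites)
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    if d_s == 0:
        raise ValueError("dS is zero; the ratio is undefined")
    return d_n, d_s, d_n / d_s


def classify_selection(ratio, tolerance=0.0):
    """Interpret dN/dS: >1 positive, =1 neutral, <1 negative."""
    if ratio > 1 + tolerance:
        return "positive (diversifying)"
    if ratio < 1 - tolerance:
        return "negative (purifying)"
    return "neutral"


# Hypothetical counts for one pair of codon-aligned sequences.
d_n, d_s, ratio = dn_ds(syn_diffs=4, syn_sites=380.0,
                        nonsyn_diffs=4, nonsyn_sites=1321.0)
print(round(d_n, 4), round(d_s, 4), classify_selection(ratio))
```

The `tolerance` parameter is an addition of this sketch; with the default of 0 the classification follows the thresholds stated in the text exactly.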

The analysis of pairwise identity frequencies showed 
a high percentage of similarity among circulating 2009-2010 
pandemic influenza strains (Fig. 1). The average percentage 
of identity was 99.7% for both HA and NA genes. Thus, in 
this period of pandemic circulation, neither gene 
segregated into different clusters; on the contrary, both 
showed a constant and stable evolution. 



The high percentages of nucleotide identity were in 
accordance with the single clustering of all 2009-2010 
strains in the phylogenetic trees of HA and NA genes (Fig. 2), 
without temporal or geographical structure. It is 
interesting to note that the overall genetic diversity among 
2009 H1N1pdm strains was lower than that typically observed 
among seasonal influenza viruses. This is in accordance with 
its short period of circulation in humans [13]. The single 
clustering of 2009 H1N1pdm observed in this report, 
however, is in contrast with other studies [14], in which the 
authors observed differences by using small datasets of 
sequences. The single clustering of 2009 H1N1pdm, 
furthermore, agrees with serological data showing that 
the new pandemic viruses were all antigenically 
similar [6], so that no update of the vaccine 
(strain A/California/07/2009) has been required until now. 

Given that 2009 H1N1pdm constituted a homogeneous 
phylogenetic group, we hypothesized that the diversity in 
nucleotide sequences was confined, on average, to the 0.3% of 
differences within each analyzed gene. Taking into account 
the complete CDS for HA and NA genes, this percentage of 
differences corresponds to approximately four to five 
random nucleotide variations among circulating strains. 
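
The figure of four to five nucleotides follows directly from the CDS lengths given earlier (1701 nt for HA, 1410 nt for NA) and the 99.7% average identity. A minimal check of that arithmetic, with variable names chosen only for illustration:

```python
# CDS lengths (nt) and average identity (%) as reported in the text.
cds_length = {"HA": 1701, "NA": 1410}
average_identity = 99.7

# Expected number of differing nucleotides between two average strains.
expected_differences = {
    gene: (100 - average_identity) / 100 * length
    for gene, length in cds_length.items()
}

for gene, value in expected_differences.items():
    print(gene, round(value, 1))
```

This gives roughly 5 differing positions for HA and roughly 4 for NA.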

Calculation of average dN/dS rates of evolution showed 
that both HA and NA genes evolved through negative 
(purifying) selection (Table 1), with dN/dS values of 0.2762 
and 0.1939, respectively. Even though, in general, both genes 
underwent negative selection, some positions can evolve 
through positive selection. For example, an early study 
showed that two sites involved in receptor binding 
specificity of HA (220 and 278) were under positive 
selection, and these sites were not found in swine or seasonal 
H1N1 viruses [15]. Thus, changes in receptor binding sites 
could lead to alterations in receptor binding specificities. 

In other viruses such as SARS-CoV, it was observed that 
they can evolve through positive selection during 
cross-species transmission in early epidemics, and through 
negative selection during late epidemics [16]. It is possible 
that the same mechanism was the driving force of evolution 
of 2009 H1N1pdm, with positive selection at least during 
cross-species transmission. 

A number of different amino acid mutations that could 
confer new functionalities to the new virus were reported 
worldwide, including those related to increased 
pathogenicity or antiviral resistance (Table 2). 

Fig. (1). Pairwise identity frequencies for (a) Hemagglutinin and (b) Neuraminidase genes, respectively. 

Fig. (2). Phylogenetic reconstruction for (a) Hemagglutinin and (b) Neuraminidase genes, respectively. The vertical black lines indicate the 
monophyletic clustering of complete nucleotide CDS for both genes. Branch distances are indicated by a scale bar (0.02 nt substitution per 
site) at the bottom of each tree. 

Table 1. Nucleotide Variation for HA and NA Genes 

Gene    Average dS    Average dN    dN/dS     Evolutionary Selection
HA      0.0105        0.0029        0.2762    Negative
NA      0.0105        0.0022        0.1930    Negative

Table 2. Mutations in HA and NA Genes of 2009 H1N1pdm. 
Numbering Corresponds to the Pandemic Prototype 
and Vaccine Strain, A/California/07/2009 

Hemagglutinin               Neuraminidase
Mutation    Percentage      Mutation    Percentage
S101N       0.2%            V106I       85.1%
S220T       76.7%           D199N       0.3%
D239E       5.5%            I223R       0.2%
D239G       2.6%            N248D       85.9%
Q310H       4.3%            H275Y       2.0%
N387H       1.7%
E391K       15.6%

Polymorphism at position 239 in HA has been associated 
with severe clinical outcomes, especially in 
immunocompromised patients; in particular, substitution 
239G was found to correlate with fatal outcomes in 
different countries [17, 18]. 
Furthermore, this mutation can arise de novo from wildtype 
(D239) virus in the same patient throughout the disease 
course [19]. Mutation at position 239 can induce alterations 
in the receptor binding site, and 239G mutants bind a 
broader range of α2-3-linked sialyl receptor sequences 
expressed on cells from the lower respiratory tract, which 
suggests that its presence could be responsible for the 
exacerbation of disease [20]. Mutants 239E target mainly 
non-ciliated cells. We found no significant difference 
between sequences bearing mutations 239G (2.6%) and 
239E (5.5%). The low percentage of global circulation of 
239G mutants found in this study is in accordance with their 
lower potential to transmit to other individuals [21]. 

Positions 239 and 220 are localized within the HA 
antigenic site called Ca. The amino acid S220, though not 
exposed to the surface, is localized in the receptor binding 
domain (RBD), and its change could affect the 
transmissibility and infectivity of H1N1 in humans. The 
fixed mutation, S220T, has been found at high percentage 
(76.7%) in this study. To test whether change 220T could 
contribute to antigenic drift, it would be interesting to 
compare its antigenic profile against a wildtype isolate 
(S220). This mutation, probably, has become fixed in all 
pandemic strains through optimization of viral fitness, rather 
than immune selection or adaptation to the host. 



Substitution S101N has been proposed previously as a 
reversion to the seasonal H1N1 residue 101N and thus 
possibly an adaptation to the human host, being found in 
some studies at high frequencies. Its global impact, however, 
is debatable because it was found in only 0.2% of our 
sequences. Substitution E391K, found at 15.6% in our study, 
has been identified as part of a highly conserved epitope in 
the 1918 H1N1 virus with a possible role in membrane 
fusion [22]. Another proximal substitution found in other 
studies, N387H, was found in only 1.7% of our sequences. 

In the NA gene, it was shown that mutations V106I and 
N248D were present in samples in increasing numbers 
through the early pandemic months (April to December 2009) 
[23]. We found both mutations at high percentages, 85.1% 
and 85.9% respectively, in our dataset. Change 106I was 
present in the 20th century cases of H1N1 (in 1918 
[pandemic], and 1977), as well as 248D (in 1977). Since the 
residue at position 248 is located in the drug target domain 
(DTD) region, as is residue 275, it could potentially affect the 
sensitivity to NA inhibitors. Another substitution of possible 
interest in NA sequences is D199N, which was previously 
associated with an increase in oseltamivir resistance in both 
seasonal and H5N1 virus strains [24]. We found, however, 
only 4 out of 1273 NA sequences (0.3%) containing this 
change. The rare substitution I223R, which was reported in 
association with resistance to oseltamivir, zanamivir, and 
peramivir [25], was also found in only 2 out of 1273 NA 
sequences (0.2%). Substitution H275Y has been related to 
oseltamivir resistance, especially in immunocompromised or 
severely ill persons [26]. It was found, however, in sporadic 
cases in most countries at low frequencies (~1%) [27]. 
In our study, we found 2% of sequences containing this 
change. 
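
The percentages reported for individual substitutions are simple proportions of the dataset, for example 4 of 1273 NA sequences for D199N. A minimal sketch of such a count is given below. It assumes translated, ungapped protein sequences whose numbering matches the reference strain A/California/07/2009; the function name and the toy sequences are hypothetical.

```python
def substitution_frequency(proteins, position, residue):
    """Percentage of sequences carrying `residue` at 1-based `position`.

    Assumes all protein sequences are aligned without gaps, so that the
    position numbering is shared with the reference strain.
    """
    if not proteins:
        raise ValueError("no sequences given")
    hits = sum(1 for seq in proteins if seq[position - 1] == residue)
    return 100 * hits / len(proteins)


# Toy dataset: 1273 short "proteins", 4 of which carry N at position 5.
wild_type = "AAAADAAAAA"
mutant = "AAAANAAAAA"
dataset = [wild_type] * 1269 + [mutant] * 4

frequency = substitution_frequency(dataset, position=5, residue="N")
print(round(frequency, 1))
```

With 4 carriers among 1273 sequences the frequency rounds to 0.3%, and with 2 carriers it rounds to 0.2%, matching the proportions quoted for D199N and I223R.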

In conclusion, the stable evolution of 2009 H1N1pdm 
offers an opportunity to control its spread and prevent 
infections. Reports about new mutations, however, will still 
be important if those changes can confer enhanced 
transmissibility or resistance to drugs. Furthermore, 
continuous surveillance by public health authorities will be 
critical to monitor the appearance of new influenza variants, 
especially in animal reservoirs such as swine and birds, in 
order to prevent the potential animal-human transmission of 
viruses with pandemic potential. 